fix: parse date values #150
Conversation
@jakevdp Should we apply this fix in Altair as well, in https://github.com/altair-viz/altair/blob/bd593867f4413936ab723b7583a508aa8c13e06d/altair/utils/core.py?
Yeah, this could be added to Altair, but storing Python date types in object arrays is really an anti-pattern. It is far better and more performant to use pandas datetimes directly, e.g. df.Year = pd.to_datetime(df.Year, format='%Y')
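The conversion suggested above can be sketched as follows (the DataFrame and its "Year" column are hypothetical, used only to illustrate the object-dtype anti-pattern and the fix):

```python
import datetime

import pandas as pd

# Hypothetical DataFrame whose "Year" column holds Python date objects,
# producing an object-dtype column (the anti-pattern discussed above).
df = pd.DataFrame({"Year": [datetime.date(2017, 1, 1), datetime.date(2018, 1, 1)]})
print(df["Year"].dtype)  # object

# Convert to native pandas datetimes, as suggested:
df["Year"] = pd.to_datetime(df["Year"])
print(df["Year"].dtype)  # datetime64[ns]
```

With a datetime64 column, serialization to Vega-Lite needs no special casing for Python date objects.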
vega/utils.py (outdated)

    if isinstance(val, np.ndarray):
        return val.tolist()
    elif isinstance(val, datetime.date):
        return "{dt:%Y-%m-%d}".format(dt=val)
Date parsing has to be done very carefully at the interface between pandas and Vega-Lite: these dates should be encoded in full ISO format, or else you will run into bugs like this one: vega/altair#1027
I completely agree that numpy types are strongly preferred over Python date types. Still, I don't think it is unreasonable to expect a column of Python dates to be serialized into valid JSON for plotting.
Adding this support will benefit anyone converting an Apache Spark DataFrame to a pandas DataFrame, because DateType columns are cast to Python dates. See SPARK-23290 for the reasoning behind this design choice.
I did add an explicit cast to ISO format.
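As context for why an explicit cast is needed at all, a minimal standard-library sketch of what happens when a raw date object reaches the JSON encoder:

```python
import datetime
import json

# Python's json module cannot serialize date objects directly:
try:
    json.dumps({"d": datetime.date(2019, 1, 1)})
except TypeError as exc:
    print(exc)  # date objects are not JSON serializable

# So the serializer must convert them to strings first:
print(json.dumps({"d": datetime.date(2019, 1, 1).isoformat()}))
```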
vega/utils.py (outdated)

    @@ -48,7 +48,7 @@ def parse_object_column_type(val):
         if isinstance(val, np.ndarray):
             return val.tolist()
         elif isinstance(val, datetime.date):
    -        return "{dt:%Y-%m-%d}".format(dt=val)
    +        return val.isoformat()
Unfortunately, a partial ISO format like "2019-01-01" is not sufficient to ensure that JavaScript date parsing is consistent with the expectations of pandas interfacing with Vega-Lite. You need the full ISO format, like "2019-01-01T00:00:00".
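The distinction can be sketched with the standard library alone (the combine-with-midnight trick is an assumption about how to produce the full form, not code from this PR):

```python
import datetime

val = datetime.date(2019, 1, 1)

# Partial ISO format: JavaScript's Date() parses date-only strings as
# UTC midnight, which can shift the rendered date in non-UTC timezones.
print(val.isoformat())  # 2019-01-01

# Full ISO format with a time component: parsed as local time,
# matching what pandas and Vega-Lite expect.
full = datetime.datetime.combine(val, datetime.time()).isoformat()
print(full)  # 2019-01-01T00:00:00
```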
Thank you for clearing this up for me; I have written more than a couple of hacks dealing with this.
Just to check: do we need to add time zone information to the generated string?
The method used in Altair and highlighted in vega/altair#1027 does not add a timezone to the ISO format string when converting numpy types. I did some exploration of this and my dates plot as expected, so I do not think it is necessary. That said, @jakevdp has done more work on this than I have, so I will defer to him.
Can you help finish this pull request @johnmdonich?
For sure. It is finished except for a squash. I was just waiting for approval/review from @jakevdp.
I can squash when I merge. |
@jakevdp @johnmdonich What's left in this pull request? How can I help push it over the finish line?
@johnmdonich could you redo this pull request?
Please send a new pull request. |
Add support for datetime.date values in pandas DataFrames, as described in #70.

Pandas types the SQL DATE type as an object column holding standard-library datetime.date values. This also comes up when converting an Apache Spark DataFrame with DateType() columns. A simple check for date values in the object-column apply() function is all that is needed.
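Putting the thread's conclusion together, a sketch of the serializer with the date check (the function name follows the diff above; the surrounding apply() usage and the full-ISO conversion are assumptions based on the review comments, not the merged code):

```python
import datetime

import numpy as np


def parse_object_column_type(val):
    """Serialize one value from an object-dtype column for Vega-Lite."""
    if isinstance(val, np.ndarray):
        return val.tolist()
    elif isinstance(val, datetime.date):
        # Use the full ISO format ("2019-01-01T00:00:00") so JavaScript
        # parses the value as local time rather than UTC midnight.
        return datetime.datetime.combine(val, datetime.time()).isoformat()
    return val


# An object column could then be converted with Series.apply, e.g.:
# df["when"] = df["when"].apply(parse_object_column_type)
print(parse_object_column_type(datetime.date(2019, 1, 1)))
```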